A Provable Defense for Deep Residual Networks
We present a training system that can provably defend significantly larger
neural networks than previously possible, including ResNet-34 and DenseNet-100.
Our approach is based on differentiable abstract interpretation and introduces
two novel concepts: (i) abstract layers for fine-tuning the precision and
scalability of the abstraction, and (ii) a flexible domain-specific language
(DSL) for describing training objectives that combine abstract and concrete
losses with arbitrary specifications. Our training method is implemented in the
DiffAI system.
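To make the underlying idea concrete, here is a minimal PyTorch sketch of interval (box) abstract interpretation together with a training objective mixing an abstract robustness term and a concrete cross-entropy term. It illustrates the flavor of objectives expressible in DiffAI's DSL but is not DiffAI's implementation; all function names, the loss shape, and the mixing weight are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def box_linear(center, radius, layer):
    # Interval (box) transformer for an affine layer: the center is
    # mapped exactly; the radius is scaled by |W|, the worst case
    # over all points in the box.
    return layer(center), radius @ layer.weight.abs().t()

def box_relu(center, radius):
    # ReLU evaluated on the interval [center - radius, center + radius].
    lo, hi = F.relu(center - radius), F.relu(center + radius)
    return (lo + hi) / 2, (hi - lo) / 2

def abstract_loss(layers, x, y, eps):
    # Propagate an L-infinity ball of radius eps through the network and
    # penalize every non-target logit whose upper bound could exceed the
    # target logit's lower bound (a differentiable robustness surrogate).
    center, radius = x, torch.full_like(x, eps)
    for layer in layers[:-1]:
        center, radius = box_relu(*box_linear(center, radius, layer))
    center, radius = box_linear(center, radius, layers[-1])
    lo, hi = center - radius, center + radius
    margin = hi - lo.gather(1, y.unsqueeze(1))        # >0: possibly misclassified
    margin = margin.scatter(1, y.unsqueeze(1), -1e9)  # exclude the target class
    return F.relu(margin).amax(dim=1).mean()

def combined_loss(layers, x, y, eps, lam=0.5):
    # Mix of a concrete cross-entropy term and the abstract term, in the
    # spirit of combined objectives described by a DiffAI-style DSL.
    h = x
    for layer in layers[:-1]:
        h = F.relu(layer(h))
    return ((1 - lam) * F.cross_entropy(layers[-1](h), y)
            + lam * abstract_loss(layers, x, y, eps))
```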
Scalable Certified Segmentation via Randomized Smoothing
We present a new certification method for image and point cloud segmentation
based on randomized smoothing. The method leverages a novel scalable algorithm
for prediction and certification that correctly accounts for multiple testing,
necessary for ensuring statistical guarantees. The key to our approach is
reliance on established multiple-testing correction mechanisms as well as the
ability to abstain from classifying single pixels or points while still
robustly segmenting the overall input. Our experimental evaluation on synthetic
data and challenging datasets, such as Pascal Context, Cityscapes, and
ShapeNet, shows that our algorithm can achieve, for the first time, competitive
accuracy and certification guarantees on real-world segmentation tasks. We
provide an implementation at https://github.com/eth-sri/segmentation-smoothing.
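The following sketch illustrates the basic recipe under simplifying assumptions: sample Gaussian-perturbed inputs, take per-pixel majority votes, and abstain on any pixel whose one-sided binomial test fails after a Bonferroni correction over all simultaneous per-pixel tests. The paper's scalable algorithm is more refined; the model interface, class count, and threshold below are illustrative placeholders.

```python
import numpy as np
from scipy.stats import binomtest

NUM_CLASSES = 21    # illustrative
ABSTAIN = -1

def certify_segmentation(model, x, sigma=0.25, n=1000, alpha=0.001, tau=0.75):
    # model(x) is assumed to return an (H, W) array of class labels.
    H, W = model(x).shape
    counts = np.zeros((H, W, NUM_CLASSES), dtype=np.int64)
    for _ in range(n):
        pred = model(x + sigma * np.random.randn(*x.shape))
        np.add.at(counts, (np.arange(H)[:, None], np.arange(W)[None, :], pred), 1)
    top = counts.argmax(axis=-1)
    top_counts = counts.max(axis=-1)
    # All H*W per-pixel tests run simultaneously, so each one-sided
    # binomial test must pass at the corrected level alpha / (H * W);
    # pixels that fail are abstained on, leaving the rest certified.
    level = alpha / (H * W)
    out = np.full((H, W), ABSTAIN, dtype=np.int64)
    for i in range(H):
        for j in range(W):
            p = binomtest(int(top_counts[i, j]), n, tau,
                          alternative="greater").pvalue
            if p <= level:
                out[i, j] = top[i, j]
    return out
```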
Programmable Synthetic Tabular Data Generation
Large amounts of tabular data remain underutilized due to privacy, data
quality, and data sharing limitations. While training a generative model
producing synthetic data resembling the original distribution addresses some of
these issues, most applications require additional constraints from the
generated data. Existing synthetic data approaches are limited as they
typically only handle specific constraints, e.g., differential privacy (DP) or
increased fairness, and lack an accessible interface for declaring general
specifications. In this work, we introduce ProgSyn, the first programmable
synthetic tabular data generation algorithm that allows for comprehensive
customization over the generated data. To ensure high data quality while
adhering to custom specifications, ProgSyn pre-trains a generative model on the
original dataset and fine-tunes it on a differentiable loss automatically
derived from the provided specifications. These can be programmatically
declared using statistical and logical expressions, supporting a wide range of
requirements (e.g., DP or fairness, among others). We conduct an extensive
experimental evaluation of ProgSyn on a number of constraints, achieving a new
state-of-the-art on some, while remaining general. For instance, at the same
fairness level we achieve 2.3% higher downstream accuracy than the
state-of-the-art in fair synthetic data generation on the Adult dataset.
Overall, ProgSyn provides a versatile and accessible framework for generating
constrained synthetic tabular data, allowing for specifications that generalize
beyond the capabilities of prior work.
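A toy sketch of the pre-train-then-fine-tune idea follows, assuming a relaxed (differentiable) generator and using a demographic-parity gap as the spec-derived penalty. The generator, data-fit term, and column indices are placeholders for illustration, not ProgSyn's actual API.

```python
import torch
import torch.nn as nn

SENS, LABEL = 3, 7   # illustrative column indices (e.g. sex, income)

class ToyGenerator(nn.Module):
    # Stand-in for a pre-trained generative model: maps noise to relaxed
    # (sigmoid) rows of a binary-encoded table, so sampled rows stay
    # differentiable and the spec penalty can be backpropagated.
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, n):
        return torch.sigmoid(self.net(torch.randn(n, 32)))

def demographic_parity_gap(rows):
    # Differentiable relaxation of a fairness spec: the mean (soft)
    # label should match across the two groups of the sensitive column.
    s, y = rows[:, SENS], rows[:, LABEL]
    rate_a = (s * y).sum() / (s.sum() + 1e-8)
    rate_b = ((1 - s) * y).sum() / ((1 - s).sum() + 1e-8)
    return (rate_a - rate_b).abs()

def finetune(gen, real_mean, steps=1000, lam=10.0):
    opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
    for _ in range(steps):
        rows = gen(512)
        # Data-fit term (here: crude moment matching) plus the penalty
        # derived from the declared specification.
        loss = ((rows.mean(0) - real_mean) ** 2).sum()
        loss = loss + lam * demographic_parity_gap(rows)
        opt.zero_grad()
        loss.backward()
        opt.step()
```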
Prompting Is Programming: A Query Language For Large Language Models
Large language models have demonstrated outstanding performance on a wide
range of tasks such as question answering and code generation. At a high level,
given an input, a language model can be used to automatically complete the
sequence in a statistically-likely way. Based on this, users prompt these
models with language instructions or examples, to implement a variety of
downstream tasks. Advanced prompting methods can even involve interaction between
the language model, a user, and external tools such as calculators. However, to
obtain state-of-the-art performance or adapt language models for specific
tasks, complex task- and model-specific programs have to be implemented, which
may still require ad-hoc interaction.
To address this, we present the novel idea of Language Model Programming (LMP).
LMP generalizes language model prompting from pure text prompts to an intuitive
combination of text prompting and scripting. Additionally, LMP allows
constraints to be specified over the language model output. This enables easy
adaptation to many tasks, while abstracting language model internals and
providing high-level semantics. To enable LMP, we implement LMQL (short for
Language Model Query Language), which leverages the constraints and control
flow from an LMP prompt to generate an efficient inference procedure that
minimizes the number of expensive calls to the underlying language model. We
show that LMQL can capture a wide range of state-of-the-art prompting methods
in an intuitive way, especially facilitating interactive flows that are
challenging to implement with existing high-level APIs. Our evaluation shows
that we retain or increase the accuracy on several downstream tasks, while also
significantly reducing the required amount of computation or cost in the case
of pay-to-use APIs (13-85% cost savings).
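To illustrate the constrained-decoding idea behind LMQL (this is not LMQL itself, nor its syntax), the Python sketch below rejects candidate tokens that would violate a declared predicate during generation, so invalid continuations are pruned at decoding time. The model, predicate, and top-k cutoff are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def generate_constrained(prompt, max_new, is_valid):
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        # Try candidate tokens in order of likelihood and keep the first
        # whose decoded continuation still satisfies the constraint.
        for t in logits.argsort(descending=True)[:50]:
            cand = torch.cat([ids, t.view(1, 1)], dim=1)
            if is_valid(tok.decode(cand[0])):
                ids = cand
                break
        else:
            break                       # no admissible token: stop early
        if t.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0])

# A constraint in the spirit of an LMQL `where` clause: the completion
# must stay on a single line and remain short.
answer = generate_constrained(
    "Q: What is the capital of France?\nA:",
    max_new=20,
    is_valid=lambda s: "\n" not in s.split("A:")[-1] and len(s) < 200,
)
```

Note that LMQL itself goes further: it compiles constraints into token-level masks applied in a single pass, which is what minimizes the number of expensive model calls; the rejection loop above is only a readable approximation of that behavior.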